Clustering Protein Sequences with Tailored General Regression Model Technique

Authors

G. Lavanya Devi

Allam Appa Rao

A. Damodaram

GR Sridhar

G. Jaya Suma

Abstract

Cluster analysis divides data into groups that are meaningful, useful, or both. Analysis of biological data is creating a new generation of epidemiologic, prognostic, diagnostic and treatment modalities. Clustering of protein sequences is one of the current research topics in the field of computer science. Linear relations are valuable for rule discovery in a given data set, for example "if value X goes up by 1, value Y goes down by 3". Classical linear regression models the linear relation between two sequences perfectly. However, if we need to cluster a large repository of protein sequences into groups whose sequences have a strong linear relationship with each other, it is prohibitively expensive to compare the sequences one by one. In this paper, we propose a new technique named General Regression Model Technique Clustering Algorithm (GRMTCA) to handle the problem of clustering linear sequences. GRMT gives a measure, GR, that tells the degree of linearity of multiple sequences without having to compare each pair of them.

Keywords: Clustering, General Regression Model, Protein Sequences, Similarity Measure.
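To make the pairwise baseline concrete, the following is a minimal sketch of classical least-squares regression between two numeric sequences, and of the all-pairs comparison that the abstract describes as prohibitively expensive. It is not the GRMT or GR measure, which the abstract does not define. The function names `linear_fit_r2` and `pairwise_r2` are hypothetical, and the sketch assumes the protein sequences have already been encoded as equal-length numeric vectors by some scheme not specified here.

```python
from itertools import combinations
from statistics import mean


def linear_fit_r2(x, y):
    """Least-squares fit y ~ a + b*x; returns (a, b, r2).

    r2 is the coefficient of determination, used here as a simple
    pairwise degree-of-linearity score (an illustrative choice).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length >= 2")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("constant sequence: linear relation undefined")
    b = sxy / sxx              # slope
    a = my - b * mx            # intercept
    r2 = (sxy * sxy) / (sxx * syy)
    return a, b, r2


def pairwise_r2(seqs):
    """Score every unordered pair: n*(n-1)/2 regressions for n sequences."""
    return {
        (i, j): linear_fit_r2(seqs[i], seqs[j])[2]
        for i, j in combinations(range(len(seqs)), 2)
    }


# The abstract's rule "if X goes up 1, Y will go down 3" as toy data.
x = [1, 2, 3, 4]
y = [10, 7, 4, 1]
a, b, r2 = linear_fit_r2(x, y)

# Three sequences already require three pairwise fits.
scores = pairwise_r2([x, y, [2, 4, 6, 9]])
```

Because the number of pairs grows quadratically with the number of sequences, this baseline becomes costly for a large repository, which is the motivation the abstract gives for a single multi-sequence measure.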

Similar resources

Signal processing approaches as novel tools for the clustering of N-acetyl-β-D-glucosaminidases

Nowadays, the clustering of proteins, and of enzymes in particular, is one of the most popular topics in bioinformatics. An increasing number of chitinase genes from different organisms, and their sequences, have been identified. So far, various mathematical algorithms have been used for the clustering of chitinase genes, but most of them seem to be confusing and sometimes insufficient. In the...

Curve Clustering with Random Effects Regression Mixtures

In this paper we address the problem of clustering sets of curve or trajectory data generated by groups of objects or individuals. The focus is to model curve data directly using a set of model-based curve clustering algorithms referred to as mixtures of regressions or regression mixtures. The proposed methodology is based on extension to regression mixtures that we call random effects regressi...

Water Quality Zoning of Rivers by the Technique of Fuzzy Clustering Analysis

Zoning the pollution of a river may be the first or even the most important step in water quality management. In order to resolve its pollution, fuzzy clustering analysis may be used whenever a composite classification of water quality incorporates multiple parameters. In such cases, the technique may be used as a complement or an alternative to comprehensive assessment. In fuzzy cluster...

Clustering of a Number of Genes Affecting in Milk Production using Information Theory and Mutual Information

Information theory is a branch of mathematics. Information theory is used in genetic and bioinformatics analyses and can be used for many analyses related to the biological structures and sequences. Bio-computational grouping of genes facilitates genetic analysis, sequencing and structural-based analyses. In this study, after retrieving gene and exon DNA sequences affecting milk yield in dairy ...

Publication date: 2012
